On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient

Authors

Jie Tang

Pieter Abbeel

Abstract

Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U(θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U(θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds.
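The connection described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' algorithm or one of their testbeds: it assumes a hypothetical one-step task with a unit-variance Gaussian policy a ~ N(θ, 1) and a made-up reward function. It builds an importance-sampled estimate of U(θ) from actions drawn under an older parameter, then shows that differentiating that estimate at the sampling parameter gives the same value as the usual likelihood ratio gradient estimate.

```python
import math
import random


def reward(a):
    # Hypothetical one-step reward, peaked at a = 3 (an assumption for illustration).
    return -(a - 3.0) ** 2


def log_prob(a, theta):
    # Log-density of the assumed unit-variance Gaussian policy a ~ N(theta, 1).
    return -0.5 * (a - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


def is_return_estimate(theta, samples, theta_old):
    # Importance-sampled estimate of U(theta) from actions drawn under theta_old.
    total = 0.0
    for a in samples:
        weight = math.exp(log_prob(a, theta) - log_prob(a, theta_old))
        total += weight * reward(a)
    return total / len(samples)


def likelihood_ratio_gradient(samples, theta_old):
    # Likelihood ratio estimate: mean of grad_theta log p(a | theta) * R(a).
    # For N(theta, 1), grad_theta log p(a | theta) = a - theta.
    return sum((a - theta_old) * reward(a) for a in samples) / len(samples)


rng = random.Random(0)
theta_old = 1.0
samples = [rng.gauss(theta_old, 1.0) for _ in range(5000)]

# Differentiate the importance-sampled estimate numerically at theta = theta_old.
eps = 1e-4
is_gradient = (
    is_return_estimate(theta_old + eps, samples, theta_old)
    - is_return_estimate(theta_old - eps, samples, theta_old)
) / (2.0 * eps)
lr_gradient = likelihood_ratio_gradient(samples, theta_old)

# The two numbers should agree up to finite-difference error.
print(is_gradient, lr_gradient)
```

Note that `is_return_estimate` can be evaluated at any θ, not only at `theta_old`. This is the sense in which the importance sampling view offers more than a gradient at the current parameters, although the weights become high-variance as θ moves away from the sampling parameter.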

Similar resources

Policy Gradients for CVaR-Constrained MDPs

We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling. Both the algorithms incorporate a CVaR estimation procedure, along t...

Simulation-Based Radar Detection Methods

In this paper, radar detection based on Monte Carlo sampling is studied. Two detectors based on Importance Sampling are presented. In these detectors, called Particle Detector, the approximated likelihood ratio is calculated by Monte Carlo sampling. In the first detector, the unknown parameters are first estimated and are substituted in the likelihood ratio (like the GLRT method). In the s...

Missing and Noisy Data in Nonlinear Time-Series Prediction

Comment added in October, 2003: This paper is now of mostly historical importance. At the time of publication (1995) it was one of the first machine learning papers to stress the importance of stochastic sampling in time-series prediction and time-series model learning. In this paper we suggested to use Gibbs sampling (Section 4), nowadays particle filters are commonly used instead. Secondly, t...

Importance Sampling for Markov Chains: Asymptotics for the Variance

In this paper, we apply the Perron-Frobenius theory for non-negative matrices to the analysis of variance asymptotics for simulations of finite state Markov chain to which importance sampling is applied. The results show that we can typically expect the variance to grow (at least) exponentially rapidly in the length of the time horizon simulated. The exponential rate constant is determined by t...

Publication date: 2010
